Summary

This study explores the roles of genome copy number abnormalities (CNAs) in breast cancer pathophysiology by identifying 
associations between recurrent CNAs, gene expression, and clinical outcome in a set of aggressively treated early-stage 
breast tumors. It shows that the recurrent CNAs differ between tumor subtypes defined by expression pattern and that strat- 
ification of patients according to outcome can be improved by measuring both expression and copy number, especially high- 
level amplification. Sixty-six genes deregulated by the high-level amplifications are potential therapeutic targets. Nine of 
these (FGFR1, IKBKB, ERBB2, PROCC, ADAM9, FNTA, ACACA, PNMT, and NR1D1) are considered druggable. Low-level
CNAs appear to contribute to cancer progression by altering RNA and cellular metabolism. 



Introduction 

It is now well established that breast cancers progress through 
accumulation of genomic (Albertson et al., 2003; Knuutila et al., 
2000) and epigenomic (Baylin and Herman, 2000; Jones, 2005) 
aberrations that enable the development of aspects of cancer 
pathophysiology such as reduced apoptosis, unchecked prolif- 
eration, increased motility, and increased angiogenesis (Hana- 
han and Weinberg, 2000). Discovery of the genes that contribute 
to these pathophysiologies when deregulated by recurrent ab- 
errations is important to understanding mechanisms of cancer 
formation and progression and to guide improvements in cancer 
diagnosis and treatment. 

Analyses of expression profiles have been particularly power- 
ful in identifying distinctive breast cancer subsets that differ in 
biological characteristics and clinical outcome (Perou et al., 
1999, 2000; Sorlie et al., 2001, 2003). For example, unsupervised
hierarchical clustering of microarray-derived expression
data has identified intrinsically variable gene sets that distinguish
five breast cancer subtypes: basal-like, luminal A, luminal
B, ERBB2, and normal breast-like. The basal-like and ERBB2 
subtypes have been associated with strongly reduced survival 
durations in patients treated with surgery plus radiation (Perou 
et al., 2000; Sorlie et al., 2001), and some studies have sug- 
gested that reduced survival duration in poorly performing sub- 
types is caused by an inherently high propensity to metastasize 
(Ramaswamy et al., 2003). These analyses already have led to 
the development of multigene assays that stratify patients into 
groups that can be offered treatment strategies based on risk 
of progression (Esteva et al., 2005; Gianni et al., 2005; van 't
Veer et al., 2002; van de Vijver et al., 2002). However, the predic- 
tive power of these assays is still not as high as desired, and the 
assays have not been fully tested in patient populations treated 
with aggressive adjuvant chemotherapies. 

Analyses of breast tumors using fluorescence in situ hybrid- 
ization (Al-Kuraya et al., 2004; Kallioniemi et al., 1992; Press 
et al., 2005; Tanner et al., 1994) and comparative genomic
hybridization (Kallioniemi et al., 1994; Loo et al., 2004; Naylor 
et al., 2005; Pollack et al., 1999) show that breast tumors also 
display a number of recurrent genome copy number aberra- 
tions, including regions of high-level amplification that have 
been associated with adverse outcome (Al-Kuraya et al., 2004; 
Cheng et al., 2004; Isola et al., 1995; Jain et al., 2001; Press 
et al., 2005). This raises the possibility of improved patient strat- 
ification through combined analysis of gene expression and 
genome copy number (Barlund et al., 2000; Pollack et al., 
2002; Ray et al., 2004; Yi et al., 2005). In addition, several studies 
of specific chromosomal regions of recurrent abnormality at 
17q12 (Kauraniemi et al., 2001, 2003) and 8p11 (Gelsi-Boyer 
et al., 2005; Ray et al., 2004) show the value of combined anal- 
ysis of genome copy number and gene expression for identifica- 
tion of genes that contribute to breast cancer pathophysiology 
by deregulating gene expression. 

We have extended these studies by performing combined 
analyses of genome copy number and gene expression to iden- 
tify genes that contribute to breast cancer pathophysiology, with 
emphasis on those that are associated with poor response to 
current therapies. By associating clinical endpoints with genome 
copy number and gene expression, we showed strong associa- 
tions between expression subtype and genome aberration com- 
position, and we identified four regions of recurrent amplification 
associated with poor outcome in treated patients. Gene expres- 
sion profiling revealed 66 genes in these regions of amplification 
whose expression levels were deregulated by the high-level 
amplifications. We also found a surprising association between 
low-level CNAs and upregulation of genes associated with 
RNA and protein metabolism that may suggest a mechanism 
by which these aberrations contribute to cancer progression. 

Results 

We assessed genome copy number using BAC array CGH 
(Hodgson et al., 2001; Pinkel et al., 1998; Snijders et al., 2001;
Solinas-Toldo et al., 1997) and gene expression profiles using 
Affymetrix U133A arrays (Ramaswamy et al., 2003; Reyal 
et al., 2005) in breast tumors from a cohort of patients treated 
according to the standard of care between 1989 and 1997 (sur- 
gery, radiation, hormonal therapy, and treatment with high-dose 
adriamycin and Cytoxan as indicated). We measured genome 
copy number profiles for 145 primary breast tumors and gene 
expression profiles for 130 primary tumors, of which 101 were 
in common. We analyzed these data to identify recurrent geno- 
mic and transcriptional abnormalities, and we assessed associ- 
ations with clinical endpoints to identify genomic events that 
might contribute to cancer pathophysiology.

Molecular characteristics and associations 
Genome copy number and gene expression features 
We found that the recurrent genome copy number and gene 
expression characteristics measured for the patient cohort in 
this study were similar to those reported in earlier studies. We 
summarize these briefly. 

Figures 1A and 1B show numerous regions of recurrent genome
CNA and nine regions of recurrent high-level amplification
involving regions of chromosomes 8, 11, 12, 17, and 20, while 
Figure 2 shows that analysis of these data using unsupervised 
hierarchical clustering resolves these tumors into the "1q/16q" 



(or "simple"), "complex," and "amplifier" genome aberration 
subtypes (Fridlyand et al., 2006). The genomic extents of the regions
of amplification are listed in Table 1. These were generally
similar to those reported in earlier studies using chromosome
(Kallioniemi et al., 1994) and array CGH (Loo et al., 2004; Naylor
et al., 2005; Pollack et al., 1999, 2002). Several of these regions
of amplification were frequently coamplified. Declaring a Fisher 
exact test p value of less than 0.05 for pairwise associations to 
be suggestive of possible significant coamplification, we found 
coamplification of 8q24 and 20q13 and coamplification of regions
at 11q13-14, 12q13-14, 17q11-12, and 17q21-24. These
analyses were underpowered to achieve significance with 
proper correction for multiple testing, so these associations are 
suggestive but not significant. However, these associations 
were consistent with the report of Al-Kuraya et al. (2004), who 
showed evidence for coamplification of genes in several of these 
regions of amplification including ERBB2, MYC, CCND1, and 
MDM2, and that of Naylor et al. (2005) showing coamplification 
of 17q12 and 17q25.
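
The pairwise coamplification screen described above can be illustrated with a two-sided Fisher exact test on a 2x2 table of amplification calls. The following is a minimal sketch rather than the analysis code used in this study; the function names and the example calls are hypothetical, and the test is implemented directly from the hypergeometric distribution so that no external library is assumed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    a: tumors amplified at both loci, b: first locus only,
    c: second locus only, d: neither locus.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x coamplified tumors with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def coamplification_p(calls_x, calls_y):
    """calls_x, calls_y: per-tumor booleans (True = amplified) for two regions."""
    pairs = list(zip(calls_x, calls_y))
    a = sum(1 for x, y in pairs if x and y)
    b = sum(1 for x, y in pairs if x and not y)
    c = sum(1 for x, y in pairs if y and not x)
    d = sum(1 for x, y in pairs if not x and not y)
    return fisher_exact_two_sided(a, b, c, d)


# Hypothetical amplification calls for two regions in six tumors.
region_1 = [True, True, True, False, False, False]
region_2 = [True, True, True, False, False, False]
print(coamplification_p(region_1, region_2))
```

With only a handful of amplified tumors per region, even perfect concordance yields a modest p value, which is in keeping with the caution above that these pairwise tests were underpowered.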

Figure S1 (in the Supplemental Data available with this article 
online) shows that unsupervised hierarchical clustering of intrin- 
sically variable genes resolves the tumors in our study cohort into 
the luminal A, luminal B, basal-like, and ERBB2 expression subtypes
previously reported for breast tumors (Perou et al., 1999,
2000; Sorlie et al., 2003). We assessed the genomic characteris- 
tics of these expression subtypes in subsequent analyses. 
Associations between CNAs and expression 
Combined analyses of genome copy number and expression 
showed that the recurrent genome CNAs differed between ex- 
pression subtypes and identified genes whose expression levels 
were significantly deregulated by the CNAs. Figures 1C-1J 
show the recurrent CNAs for each expression subtype. In these 
analyses, we assigned each tumor to the expression subtype 
cluster (basal-like, ERBB2, luminal A, and luminal B) to which 
its expression profile was most highly correlated. We did not 
assess aberrations in normal-like tumors due to the small num- 
ber of such tumors. Figure 1C shows that the basal-like tumors 
were relatively enriched for low-level copy number gains involv- 
ing 3q, 8q, and 10p and losses involving 3p, 4p, 4q, 5q, 12q, 
13q, 14q, and 15q, while Figure 1D shows that high-level amplification
at any locus was infrequent in these tumors. Figure 1E
shows that ERBB2 tumors were relatively enriched for increased
copy number at 1q, 7p, 8q, 16p, and 20q and reduced copy number
at 1p, 8p, 13q, and 18q. Figure 1F shows that amplification of
ERBB2 was highest in the ERBB2 subtype as expected, but 
amplification of noncontiguous, distal regions of 17q also was 
frequent as previously reported (Barlund et al., 1997). Figure 1G
shows that increased copy number at 1q and 16p and reduced 
copy number at 16q were the most frequent abnormalities in 
luminal A tumors, while Figure 1H shows that high-level amplifications
at 8p11-12, 11q13-14, 12q13-14, 17q11-12, 17q21-24,
and 20q13 were relatively common in this subtype. Figure 1I
shows that gains of chromosomes 1q, 8q, 17q, and 20q and 
losses involving portions of 1p, 8p, 13q, 16q, 17p, and 22q 
were prevalent in luminal B tumors, while Figure 1J shows that
high-level amplifications involving 8p11-12, two regions of 8q,
and 11q13-14 were frequent. Bergamaschi et al. (2006) have
reported similar CNA patterns for the luminal A, luminal B, basal, 
and ERBB2 expression clusters. 

Figure 1. Recurrent abnormalities in 145 primary breast tumors

A: Frequencies of genome copy number gain and loss plotted as a
function of genome location with chromosome 1pter to the left and
chromosomes 22qter and X to the right. Vertical lines indicate
chromosome boundaries, and vertical dashed lines indicate centromere
locations. Positive and negative values indicate frequencies of
tumors showing copy number increases and decreases, respectively,
with gain and loss as described in the Experimental Procedures.
B: Frequencies of tumors showing high-level amplification. Data are
displayed as described in A.
C-J: Frequencies of tumors showing significant copy number gains and
losses as defined in A (upper member of each pair) or high-level
amplifications as defined in B (lower member of each pair) in tumor
subtypes defined according to expression phenotype; C and D,
basal-like; E and F, ERBB2; G and H, luminal A; I and J, luminal B.
Data are displayed as described in A.

In order to understand how the genome aberrations influence
cancer pathophysiologies, we identified genes that were
deregulated by recurrent genome CNAs. We took these genes
to be those whose expression levels were significantly associ- 
ated with copy number (Holm-adjusted p value < 0.05). These 
genes, which represent about 10% of the genome interrogated 
by the Affymetrix HGU133A arrays used in this study, and their 
copy number-expression level correlation coefficients are listed 
in Table S3. This extent of genome-aberration-driven deregulation
of gene expression is similar to that reported in earlier studies
(Hyman et al., 2002; Pollack et al., 1999). We tested associations
between copy number and expression level for 186 genes
in regions of amplification at 8p11-12, 11q13-q14, 17q11-12,
and 20q13, and we identified 66 genes in these regions whose
expression levels were correlated with copy number (FDR < 
0.01, Wilcoxon rank-sum test; Table 3). These genes define 
the transcriptionally important extents of the regions of recurrent 
amplification. Twenty-three were from a 5.5 Mbp region at 
8p11-12 flanked by SPFH2 and LOC441347, ten were from 



a 6.6 Mbp region at 11q13-14 flanked by CCND1 and PRKRIR,
nineteen were from a 3.1 Mbp region at 17q12 flanked by LHX1
and NR1D1, and fourteen were from a 5.4 Mbp region at 20q13
flanked by ZNF217 and C20orf45.
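
The genome-wide screen above declares a gene deregulated when its Holm-adjusted p value falls below 0.05. As an illustration of that adjustment step only, the sketch below applies the Holm step-down procedure to a list of raw p values. The per-gene test that produces the raw p values is not reproduced here, and the gene names and p values in the example are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p values in the original order."""
    m = len(p_values)
    # Visit hypotheses from the smallest raw p value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p values for copy number-expression association.
raw = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.03}
adjusted = dict(zip(raw, holm_adjust(list(raw.values()))))
deregulated = [gene for gene, p in adjusted.items() if p < 0.05]
print(adjusted, deregulated)
```

Holm's procedure controls the family-wise error rate and is never less powerful than a plain Bonferroni correction, which may be one reason to prefer it when several thousand genes are tested at once.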

Since the recurrent genome aberrations differed between 
expression subtypes, we explored the extent to which the ex- 
pression subtypes were determined by genome copy number. 
Specifically, we applied unsupervised hierarchical clustering to 
intrinsically variable genes after removing genes whose expres- 
sion levels were correlated with copy number. Figure 4 shows 
that the tumors still resolve into the basal-like and luminal clas- 
ses. However, the ERBB2 cluster was lost. 

Figure 2. Unsupervised hierarchical clustering of genome copy number
profiles measured for 145 primary breast tumors

Green indicates increased genome copy number, and red indicates
decreased genome copy number. The three major genomic clusters
from left to right are designated 1q/16q, complex, and amplifying.
The bar to the left indicates chromosome locations with chromosome
1pter to the top and 22qter and X to the bottom. The locations of
the odd-numbered chromosomes are indicated. The upper color bars
indicate biological and clinical aspects of the tumors. Color codes
are indicated at the bottom of the figure. Dark blue indicates
positive status, and light blue indicates negative status for node,
ER, PR, and p53 expression. For Ki67, dark blue indicates fraction
>0.1, and light blue indicates fraction <0.1. For size, light blue
indicates size <2.2 cm, and dark blue indicates size >2.2 cm. Color
codes for the expression bar are as follows: orange, luminal A;
dark blue, normal breast-like; light blue, ERBB2; green, basal-like;
yellow, luminal B.

Associations with clinical variables
Associations with histopathology

Figure 2 and Table 2 summarize associations of histopathological
features with aspects of genome abnormality, including
recurrent genome abnormalities, total number of copy number
transitions, fraction of the genome altered (FGA), number of 
chromosomal arms containing at least one amplification, num- 
ber of recurrent amplicons, and presence of at least one recur- 
rent amplification. These analyses showed that ER/PR-negative 
tumors were predominantly found in the basal-like expression 
and "complex" genome aberration subtypes, respectively. 
Node-positive tumors had significantly more amplified arms 
and recurrent amplicons than node-negative samples but 
showed a much more moderate difference in terms of low-level 
copy number transitions. Stage 1 tumors had moderately fewer 
low- and high-level changes than higher-stage tumors. The 
number of low- and high-level abnormalities increased with 
SBR grade. Interestingly, the "complex" tumors showing 
many low-level abnormalities were more strongly associated 
with aberrant p53 expression than "amplifying" tumors. 



"Simple" tumors tended to have Ki67 proliferation indices 
<10%, while "complex" and "amplifying" tumors typically had 
Ki67 indices >10%. The number of amplifications increased sig- 
nificantly with tumor size, but the number of low-level changes 
did not. We observed no association of genomic changes with 
the age at diagnosis. 
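
Two of the genomic summary variables used above, the fraction of the genome altered (FGA) and the total number of copy number transitions, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this study: it assumes that each array clone has already been called as loss (-1), normal (0), or gain (+1), that clones are ordered by genome position within each chromosome, and that FGA is approximated by the fraction of altered clones. The example profile is hypothetical.

```python
def fraction_genome_altered(calls_by_chromosome):
    """Fraction of clones whose call is not normal (assumed proxy for FGA)."""
    calls = [c for chrom in calls_by_chromosome.values() for c in chrom]
    if not calls:
        return 0.0
    return sum(1 for c in calls if c != 0) / len(calls)


def count_transitions(calls_by_chromosome):
    """Number of changes of state between adjacent clones, per chromosome."""
    total = 0
    for calls in calls_by_chromosome.values():
        # Transitions are counted within a chromosome, never across chromosomes.
        total += sum(1 for prev, cur in zip(calls, calls[1:]) if prev != cur)
    return total


# Hypothetical calls for a tumor with 1q gain and 16q loss.
profile = {"1": [0, 0, 1, 1, 0], "16": [0, -1, -1, -1]}
print(fraction_genome_altered(profile), count_transitions(profile))
```

A tumor with a few whole-arm changes has a moderate FGA but few transitions, whereas a "complex" tumor would be expected to score high on both measures.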
Associations with outcome 

Figure 2 and Table S2 summarize associations between histo- 
pathological, transcriptional, and genomic characteristics and 
outcome endpoints identified using multivariate regression 
analysis. Histopathological features including size and nodal 
status were significantly associated with survival duration and/ 
or disease recurrence in univariate analyses (Table S1) and 
were included in the multivariate regressions described below. 

Table 1. Univariate and multivariate associations for individual amplicons and/or disease-specific survival and distant recurrence

                                                                    p value,            p value, luminal A, p value,
                                                                    univariate          univariate          multivariate
Amplicon      Flanking clone   Flanking clone   Kb start  Kb end    Surv.     Recur.    Surv.     Recur.    Surv.     Recur.
              (left)           (right)
8p11-12       [illegible]      [illegible]      33579     43001     0.011     0.004     0.022     0.004     0.037     0.006
8q24          RP11-65D17       RP11-94M13       127186    132829    0.830     0.880     0.140     1.0       0.870     0.720
11q13-14      CTD-2080119      RP11-256P19      68482     71659     0.540     0.410     0.016     0.240     0.660     0.440
11q13-14      RP11-102M18      RP11-215H8       73337     78686     0.230     0.150     0.016     0.240     0.360     0.190
12q13-14      BAL12B2624       RP11-92P22       67191     74053     0.250     0.260     0.230     0.098     0.920     0.960
17q11-12      RP11-5808        RP11-87N6        34027     38681     0.004     0.004     1.0       1.0       0.022     0.008
17q21-24      RP11-234J24      RP11-84E24       45775     70598     0.960     0.920     0.610     0.290     0.530     0.630
20q13         RMC20B4135       RP11-278113      51669     53455     0.340     0.800     0.048     0.140     0.590     0.970
20q13         GS-32119         RP11-94A18       55630     59444     0.087     0.230     0.048     0.140     0.060     0.220
Any amplicon  -                -                -         -         0.005     0.003     0.024     0.120     0.034     0.009

Surv., disease-specific survival; Recur., distant recurrence. Also shown are the chromosomal positions of the beginning and ends of the amplicons and the flanking clones. Associations are shown for the entire sample
set and for luminal A tumors (univariate associations only).

The tumor subtypes based on patterns of gene expression or
genome aberration content showed moderate associations with
outcome endpoints. For example, Figure 3A shows that patients
with tumors classified as ERBB2 based on expression pattern
had significantly shorter disease-specific survival than patients
classified as luminal A or luminal B as previously reported (Perou
et al., 2000; Sorlie et al., 2001). Unlike these earlier reports,
patients with tumors classified as basal-like did not do signifi- 
cantly worse than patients with luminal or normal breast-like 
tumors, although there was a trend in that direction. In addition, 
Figure 3B indicates that patients with tumors classified as 
"1q/16q" based on genome aberration content tended to 
have longer disease-specific survival than patients with "com- 
plex" or "amplifier" tumors. 

We found that high-level amplification was most strongly as- 
sociated with poor outcome in this aggressively treated patient 
population. Amplification at any of the nine recurrent amplicons 
was an independent risk factor for reduced survival duration (p < 
0.04) and distant recurrence (p < 0.01) in a multivariate Cox-pro- 
portional model that included tumor size and nodal status. 
Figure 3C, for example, shows that patients whose tumors 
had at least one recurrent amplicon survived a significantly 
shorter time than did patients with tumors showing no amplifications.
More specifically, amplifications of 8p11-12 or 17q11-12
(ERBB2) were significantly associated with disease-specific sur- 
vival and distant recurrence in all patients in multivariate regres- 
sions (Table 1). Importantly, we found that stratification accord- 
ing to amplification status allowed identification of patients with 
poor outcome even within an expression subtype. Figure 3D, for 



example, shows that patients with luminal A tumors and amplification
at 8p11-12, 11q13-14, or 20q13 had significantly shorter
disease-specific survival than patients without amplification in 
one of these regions (the number of samples in the luminal A 
subtype group was too small for multivariate regressions). 
Amplification at 8p11-12 was most strongly associated with distant
recurrence in the luminal A subtype.
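
The survival comparisons in Figure 3 rest on Kaplan-Meier estimates computed separately for each patient group, for example tumors with and without a recurrent amplicon. A minimal sketch of the estimator is given below for illustration; it is not the analysis code used in this study, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up durations (any consistent unit).
    events: 1 if death from disease was observed, 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical follow-up (years) for patients with an amplified tumor.
amplified_times = [2, 3, 3, 5, 8]
amplified_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(amplified_times, amplified_events):
    print(t, round(s, 3))
```

Running the same function on the non-amplified group gives the second curve; a log-rank test or a Cox proportional hazards model, as used in the regressions reported here, would then be needed to assess whether the curves differ.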

Considering the strong association between amplification and 
outcome, we explored the possibility that some of these genes 
were overexpressed in tumors in which they were not amplified 
and that overexpression was associated with reduced survival 
duration in those tumors. Increased expression levels of seven 
genes (see Table 3) were associated with reduced survival or 
distant recurrence at the p < 0.1 level, but only two, the growth 
factor receptor-binding protein GRB7 (17q) and the keratin-associated
protein KRTAP5-9 (11q), at the p < 0.05 level. Interestingly,
this analysis also revealed an unexpected association
between reduced expression levels of genes from regions of 
amplification and poor outcome (either disease-free survival or 
distant recurrence) in tumors without relevant amplifications 
(p < 0.05). This was especially prominent for genes from the re- 
gion of amplification at 8p11-12 (14 of 23 genes in this region 
showed this association), while only two genes from regions of 
adverse-outcome-associated amplifications on chromosomes 
17q and 20q showed this association. Following this lead, we 
tested associations between outcome and reduced copy number
at 8p11-12 in patients whose tumors were not amplified at
8p11-12. Figure 3E shows that patients with reduced copy
number at 8p11-12 did worse than patients without a deletion
in this region. Figure 3F shows that patients in the overall study
with high-level amplification or deletion at 8p11-12 had
significantly shorter survival (p = 0.0017) than patients without
either of those events.

Table 2. Associations of genomic variables with clinical features

                                      Fraction of     Total number    Number of       Number of       Presence of
                                      genome          of              amplified       recurrent       recurrent
                                      altered (1)     transitions (1) arms (1)        amplicons (1)   amplicons (2)
1. ER (negative versus positive)      <0.001          <0.001          0.376           0.147           0.482
2. PR (negative versus positive)      0.005           <0.001          <0.050          0.319           0.390
3. Nodes (positive versus negative)   0.053           0.106           0.012           0.012           0.008
4. Stage (>1 versus 1)                0.013           0.052           0.045           0.312           0.368
5. ERBB2 (positive versus negative)   0.650           0.830           0.015           <0.001          <0.001
6. Ki67 (>0.1 versus <0.1)            0.013           0.031           0.024           0.010           0.005
7. p53 (positive versus negative)     0.001           <0.001          0.043           0.573           0.171
8. Size                               0.339           0.088           0.016           0.005           0.015
9. Age at Dx                          0.767           0.361           0.223           0.905           0.947
10. SBR grade                         <0.001          <0.001          0.008           0.206           0.035
11. Expression subtype                <0.001          <0.001          0.002           0.003           <0.001
12. Genomic subtype                   <0.001          <0.001          <0.001          0.001           <0.001

(1) Kruskal-Wallis test (1-7, 11, and 12), significance of robust linear regression standardized coefficient (8-10).
(2) Fisher exact test (1-7, 11, and 12), significance of robust linear regression standardized coefficient (8-10).

Figure 3. Kaplan-Meier plots showing survival in breast tumor subclasses

A: Disease-specific survival in 130 breast cancer patients whose tumors were
defined using expression profiling to be basal-like (green curve), luminal A
(yellow curve), luminal B (orange curve), and ERBB2 (purple curve) class.
B: Disease-specific survival of patients with tumors classified by genome
copy number aberration analysis as 1q/16q (green), complex (red), and
amplifying (blue).
C: Survival of patients with (red curve) and without (green curve) amplification
at any region of recurrent amplification.
D: Survival of patients whose tumors were defined using expression profiling
to be luminal A tumors with (red curve) and without (green curve) amplification
at 8p11-12, 11q13, and/or 20q.
E: Survival of patients whose tumors were not amplified at 8p11-12 and had
normal (green curve) or reduced (red curve) genome copy number at
8p11-12.
F: Survival of patients whose tumors had normal (green curve) or abnormal
(red curve) genome copy number at 8p11-12.

We also tested for associations of low-level genome copy 
number changes with the outcome endpoints. The most frequent 
low-level copy number changes (e.g., increased copy number at 
1q, 8q, and 20q or decreased copy number at 16q) were not significantly
associated with outcome endpoints. However, we did
find a significant association of the loss of a small region on 9q22 
with adverse outcome, both disease-specific survival and distant
recurrence, which persisted even after correction for multiple
testing (p < 0.05, multivariate Cox regression). This region is defined
by BACs CTB-172A10 and RP11-80F13. We also found
a marginally significant association between fraction of the 
genome lost and disease-specific survival in luminal A tumors 
(p < 0.02 and < 0.06 for univariate and multivariate regression, 
respectively, Cox-proportional regression). 

We used the program GOstat (Beissbarth and Speed, 2004) to
identify the Gene Ontology (GO) classes of 1444 unique genes 



(1734 probe sets) whose expression levels were preferentially
modulated by low-level CNAs compared to 3026 probe sets 
whose expression levels did not show associations with copy 
number. The GO categories most significantly overrepresented 
in the set of genes with a dosage effect compared to genes with 
no or minimal dosage effect involved RNA processing (Holm ad- 
justed p value < 0.001), RNA metabolism (p < 0.01), and cellular 
metabolism (p < 0.02). 
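
The logic of a GO overrepresentation test of this kind can be illustrated with a one-sided hypergeometric calculation: given a pool of probe sets, some of which are annotated to a category, how surprising is the number of annotated probe sets found in the dosage-affected group? The sketch below is an assumption-laden illustration and not a description of GOstat's internals. The pool is taken as the 1734 dosage-affected plus the 3026 comparison probe sets, and the category counts are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: probe sets in the pool, K: pool members annotated to the category,
    n: probe sets in the study group, k: study-group members annotated.
    """
    upper = min(n, K)
    # comb() returns 0 for impossible configurations, so no extra guard is needed.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / comb(N, n)


study_size = 1734               # dosage-affected probe sets (from the text)
pool_size = 1734 + 3026         # assumed pool: study group plus comparison group
annotated_in_pool = 300         # hypothetical count for one GO category
annotated_in_study = 150        # hypothetical count for the same category
p = enrichment_p(annotated_in_study, study_size, annotated_in_pool, pool_size)
print(p)
```

Because many GO categories are tested, the resulting p values would still need a multiple-testing adjustment such as the Holm correction reported above.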

Discussion 

This paper describes a comprehensive analysis of gene expres- 
sion and genome copy number in aggressively treated primary 
human breast cancers performed in order to (1) identify genomic 
events that can be assayed to better stratify patients according 
to clinical behavior, (2) develop insights into how molecular ab- 
errations contribute to breast cancer pathogenesis, and (3) dis- 
cover genes that might be therapeutic targets in patients that do 
not respond well to current therapies. An accompanying paper 
in this issue of Cancer Cell shows that many of these aberrations 
are found in subsets of breast cancer cell lines that can be ma- 
nipulated to confirm functions suggested by associations with 
pathophysiology established here (Neve et al., 2006). 

Molecular markers that predict outcome 

Our combined analyses of genome copy number and gene expression
focused on tumors from patients treated more aggressively
than those in previously published studies (Perou et al.,
2000; Sorlie et al., 2001) (i.e., with surgery, radiation of the sur- 
gical margins, hormonal therapy for ER-positive disease, and 
aggressive adjuvant chemotherapy as indicated) and revealed 
two important associations. 

First, they showed that patients with tumors
classified as basal-like according to expression pattern did not
have significantly worse outcome than patients with luminal or
normal-like tumors in this tumor set, unlike previous reports 
(Perou et al., 2000; Sorlie et al., 2001) (see Figure 3A), although 
there was a trend toward lower survival. However, patients 
with ERBB2-positive tumors did have significantly increased 
death from disease and shorter recurrence-free survival in ac- 
cordance with the earlier studies. This may indicate that the ag- 
gressive chemotherapy employed for treatment of the predomi- 
nantly ER-negative basal-like tumors increased survival duration 
in these patients relative to patients with tumors in the other subgroups.
Thus, outcome for patients with basal-like tumors may
not be as bad as indicated by earlier prognostic studies of patient 
populations that did not receive aggressive chemotherapy for 
progressive disease. Alternatively, the differences may be due to
differences in cohort selection. In either case, this result empha- 
sizes the need to interpret the performance of molecular markers 
for patient stratification in the context of specific treatment reg- 
imens and in molecularly defined cohorts. 

Second, we found that aggressively treated patients with high- 
level amplification had worse outcome than did patients without 
amplification (see Figure 3C). This is consistent with earlier CGH 
and single-locus analyses of associations of amplification with 
poor prognosis (Al-Kuraya et al., 2004; Blegen et al., 2003; 
Callagy et al., 2005; Gelsi-Boyer et al., 2005; Weber-Mangal 
et al., 2003). Moreover, the presence of high-level amplification 
was an indicator of poor outcome, even within patient subsets 
defined by expression profiling. This was particularly apparent 
for luminal A tumors, as illustrated in Figure 3D, where patients
whose tumors had high-level amplification at 8p11-12, 11q13- 
14, or 20q13 did significantly worse than patients without am- 
plification. This suggests that stratification according to both 
expression level and copy number will identify patients that 
respond poorly to current therapeutic treatment strategies. 

Mechanisms of disease progression 

Our combined analyses of genome copy number and gene ex- 
pression showed substantial differences in recurrent genome 
abnormality composition between tumors classified according 
to expression pattern and revealed that over 10% of the genes 
interrogated in this study had expression levels that were highly 
significantly associated with genome copy number changes. 
Most of the gene expression changes were associated with 
low-level changes in genome copy number, but 66 were deregu- 
lated by the high-level amplifications associated with poor 
outcome. These analyses provide insights into the etiology of 
breast cancer subtypes, suggest mechanisms by which the 
low-level copy number changes contribute to cancer patho- 
genesis, and identify a suite of genes that contribute to cancer 
pathophysiology. 
Breast cancer subtypes 

Figures 1 and 2 show that recurrent genome copy number aber- 
rations differ substantially between tumors classified according 
to expression pattern as described previously (Perou et al., 
1999). This is consistent with a model of cancer progression in 
which the expression subtype and genotype are determined 
by the cell type and stage of differentiation that survives telo- 
mere crisis and acquires sufficient proliferative advantage to 
achieve clonal dominance in the tumor (Chin et al., 2004). This 
model suggests that the genome CNA spectrum is selected to 
be most advantageous to the progression of the specific cell 
type that achieves immortality and clonal dominance. In this 
model, the recurrent genome CNA composition can be considered
an independent subtype descriptor, much as genome
CNA composition can be considered to be a cancer type de- 
scriptor (Knuutila et al., 2000). The independence of the genome 
CNA composition and basal and luminal expression subtypes is 
clear from Figure 4, which shows that the breast tumors divide 
into basal and luminal subtypes using unsupervised hierarchical 
clustering even after all transcripts showing associations with 
copy number are removed from the data set. Of course, the 
ERBB2 subtype is lost, since that subtype is strongly driven 
by ERBB2 amplification. 
Low-level abnormalities 

The most frequent low-level copy number changes were not as- 
sociated with reduced survival duration, although some were 
associated with other markers usually associated with survival 
such as tumor size, nodal status, and grade (see Table 2). This 
raises the question of why the recurrent low-level CNAs are se- 
lected. GOstat analyses of the genes deregulated by these ab- 
normalities showed that numerous genes involved in RNA and 
cellular metabolism were significantly upregulated by these 
events. Interestingly, we found these same GO classes to be
significantly altered in a collection of breast cancer cell lines
and in a study of ovarian cancer (W.-L.K., unpublished data).
We also observed that many of the recurrent low-level aberrations
matched the low-level copy number changes in the
ZNF217-transfected human mammary epithelial cells that
emerged after passage through telomere crisis having achieved
clonal dominance in the culture (Chin et al., 2004; see Figure S2),
presumably because the aberrations they carried conferred
a proliferative advantage. This suggests to us that the
low-level CNAs are selected during early cancer formation because
they increase basal metabolism, thereby providing a net
survival/proliferative advantage to the cells that carry them.
This idea is supported by a report that some of these same classes
of genes were associated with proliferative fitness in yeast
(Deutschbauer et al., 2005). That study described analyses of
proliferative fitness in the complete set of Saccharomyces cere- 
visiae heterozygous deletion strains and reported reduced 
growth rates for strains carrying deletions in genes involved in 
RNA metabolism and ribosome biogenesis and assembly. 
High-level amplification 

We found that high-level amplifications were associated with re- 
duced survival duration and/or distant recurrence overall and 
within the luminal A expression subgroup. We identified 66 
genes in these regions whose expression levels were correlated 
with copy number. GO analyses of those genes showed that
they are involved in aspects of nucleic acid metabolism, protein
modification, signaling, and the cell cycle and/or protein transport,
and evidence is mounting that many if not most of these
genes are functionally important in the cancers in which they
are amplified and overexpressed (see Table 3). Indeed,
published functional studies in model systems already have implicated
eleven of these genes in diverse aspects of cancer
pathophysiology. Six of these are encoded in the region of amplification
at 8p11. These encode the RNA-binding protein
LSM1 (Fraser et al., 2005), the receptor tyrosine kinase FGFR1
(Braun and Shannon, 2004), the cell-cycle-regulatory protein
TACC1 (Still et al., 1999), the metalloproteinase ADAM9 (Mazzocca
et al., 2005), the serine/threonine kinase IKBKB (Greten
and Karin, 2004; Lam et al., 2005), and the DNA polymerase
POLB (Clairmont et al., 1999). Functionally validated genes in
the region of amplification at 11q13 include the cell-cycle-regulatory
protein CCND1 (Hinds et al., 1994) and the growth factor
FGF3 (Okunieff et al., 2003). Functionally important genes in the
region of amplification at 17q include the transcription regulation
protein PPARBP (Zhu et al., 2000), the receptor tyrosine kinase
ERBB2 (Slamon et al., 1989), and the adaptor protein GRB7
(Tanaka et al., 2000), while the AKT-pathway-associated transcription
factor ZNF217 (Huang et al., 2005; Nonet et al.,
2001) and the RNA-binding protein RAE1 (Babu et al., 2003)
are functionally validated genes encoded in the region of amplification
at 20q13. Further support for the functional importance
of seven of these genes (TACC1, ADAM9, IKBKB, POLB,
CCND1, GRB7, and ZNF217) in oncogenesis comes from the
observation that they are within 100 Kbp of sites of recurrent
tumorigenic viral integration in the mouse (Akagi et al., 2004), and
three (IKBKB, CCND1, and GRB7) are within 10 Kbp of such
a site. Taking proximity to a site of recurrent tumorigenic viral
integration as evidence for a role in cancer genesis implicates an
additional 13 genes or transcripts (see Table 3).

The biological roles of the genes deregulated by recurrent 
high-level amplification are diverse and vary between regions 
of amplification. For example, genes deregulated by amplification
at 11q13 and 17q11-12 predominantly involved signaling
and cell cycle regulation, while genes deregulated by amplification
at 8p11-12 and 20q13 were of mixed function but were
associated most frequently with aspects of nucleic acid metab- 
olism. The predominance of genes involved in nucleic acid 
Table 3. Functional characteristics of genes in recurrent amplicons associated with reduced survival duration in breast cancer

| Gene | Ch | Mbp | p value, amplification | p value, disease-free survival | p value, distant recurrence | Transcript description | Cancer function reference | Kbp to site of viral integration | Druggable? |
|---|---|---|---|---|---|---|---|---|---|
| SPFH2** | 8 | 37.6 | 7.08E-07 | 0.053 | 0.003 | chromosome 8 open reading frame 2 | | | |
| PROSC** | 8 | 37.7 | 2.28E-05 | 0.390 | 0.043 | racemase and epimerase activity, energy metabolism | | | yes |
| BRF2** | 8 | 37.8 | 1.20E-05 | 0.004 | 0.003 | transcription factor regulating nucleic acid metabolism | | | |
| RAB11FIP1 | 8 | 37.8 | 7.77E-04 | 0.620 | 0.250 | GTPase-activating protein involved in signal transduction | | | |
| ASH2L** | 8 | 38.0 | 5.88E-06 | 0.036 | 0.002 | DNA-binding protein involved in nucleic acid metabolism | | | |
| LSM1 | 8 | 38.0 | 6.79E-06 | 0.300 | 0.130 | RNA-binding protein involved in nucleic acid metabolism | Fraser et al., 2005; Takahashi et al., 2002 | | |
| BAG4 | 8 | 38.1 | 8.73E-07 | 0.330 | 0.063 | BCL2-associated chaperone protein involved in apoptosis | Gehrmann et al., 2005 | | |
| DDHD2** | 8 | 38.1 | 4.40E-06 | 0.008 | 0.006 | phospholipase involved in energy metabolism | | | |
| WHSC1L1 | 8 | 38.2 | 9.04E-06 | 0.760 | 0.730 | nucleic acid binding | | | |
| FGFR1* | 8 | 38.3 | 1.04E-04 | 0.025 | 0.540 | receptor tyrosine kinase involved in signal transduction | Braun and Shannon, 2004; Ray et al., 2004 | | yes/PD173074 |
| TACC1** | 8 | 38.7 | 6.72E-03 | 0.020 | 0.043 | cell cycle control protein associated with signal transduction | Still et al., 1999 | 44.1/Plekha2 | |
| ADAM9 | 8 | 38.9 | 1.91E-04 | 0.930 | 0.960 | metalloproteinase associated with protein metabolism | Mazzocca et al., 2005 | 75/Plekha2 | yes |
| GOLGA7 | 8 | 41.4 | 7.10E-05 | 0.140 | 0.170 | integral membrane protein associated with transport | | | |
| SLD5 | 8 | 41.4 | 1.41E-03 | 0.780 | 0.460 | unknown | | | |
| MYST3** | 8 | 41.8 | 5.74E-05 | 0.006 | 0.022 | transcription-regulatory protein involved in nucleic acid metabolism | | | |
| AP3M2** | 8 | 42.0 | 4.43E-05 | 0.038 | 0.220 | adapter protein associated with transport | | | |
| IKBKB** | 8 | 42.1 | 7.73E-05 | 0.002 | 0.002 | serine/threonine kinase associated with signal transduction | Greten and Karin, 2004; Lam et al., 2005 | 3.1/AK018683 | yes/PS-1145 |
| POLB** | 8 | 42.2 | 2.15E-04 | 0.001 | 0.008 | DNA polymerase involved in nucleic acid metabolism | Clairmont et al., 1999 | 70.1/AK018683 | |
| VDAC3** | 8 | 42.3 | 9.93E-05 | 0.056 | 0.290 | voltage-dependent anion channel associated with transport | | | |
| SLC20A2 | 8 | 42.3 | 1.98E-03 | 0.170 | 0.240 | membrane transport protein | | | |
| THAP1** | 8 | 42.7 | 7.13E-03 | 0.190 | 0.097 | unknown | | | |
| FNTA** | 8 | 42.9 | 3.13E-03 | 0.067 | 0.370 | prenyltransferase associated with protein metabolism | | | yes |
| LOC441347 | 8 | 43.0 | 7.77E-04 | 0.180 | 0.810 | unknown | | | |
| CCND1 | 11 | 69.2 | 1.50E-06 | 0.560 | 0.770 | cell cycle control protein involved in signal transduction | Hinds et al., 1994 | 0.4/Fgf3 | |
| FGF3 | 11 | 69.4 | 1.84E-03 | 0.920 | 0.420 | growth factor involved in signal transduction | Okunieff et al., 2003 | | |
| FADD | 11 | 70.0 | 7.42E-03 | 0.200 | 0.250 | adapter molecule associated with signal transduction | | | |
| PPFIA1 | 11 | 70.0 | 1.53E-05 | 0.670 | 0.550 | anchor protein associated with cell growth and/or maintenance | | | |
| CTTN* | 11 | 70.0 | 2.69E-04 | 0.450 | 0.100 | cytoskeletal protein associated with cell growth and/or maintenance | | | |
| NADSYN1 | 11 | 70.9 | 3.42E-04 | 0.290 | 0.990 | unknown | | | |
| KRTAP5-9* | 11 | 71.0 | 3.72E-03 | 0.035 | 0.050 | cytoskeletal protein associated with cell growth and/or maintenance | | | |
| FOLR3 | 11 | 71.6 | 1.54E-03 | 0.730 | 0.490 | cell surface receptor associated with signal transduction | | | |
| NEU3 | 11 | 74.4 | 9.73E-03 | 0.460 | 0.370 | neuraminidase associated with protein metabolism | | | |
| N-PAC" | 11 | 75.8 | 4.39E-03 | 0.110 | 0.038 | protein kinase | | | |
| LHX1* | 17 | 35.5 | 1.41E-03 | 0.250 | 0.018 | transcription factor associated with nucleic acid metabolism | | | |






CANCER CELL DECEMBER 2006 






Table 3. Continued

| Gene | Ch | Mbp | p value, amplification | p value, disease-free survival | p value, distant recurrence | Transcript description | Cancer function reference | Kbp to site of viral integration | Druggable? |
|---|---|---|---|---|---|---|---|---|---|
| ACACA | 17 | 35.6 | 8.24E-03 | 0.850 | 0.850 | carboxylase associated with energy metabolism | | | yes |
| DDX52 | 17 | 36.2 | 3.47E-04 | 0.300 | 0.560 | RNA-binding protein associated with nucleic acid metabolism | | | |
| TBC1D3 | 17 | 36.7 | [illegible] | 0.170 | 0.170 | unknown | | | |
| SOCS7 | 17 | 36.9 | 4.00E-03 | 0.450 | 0.600 | adapter molecule associated with signal transduction | | | |
| PCGF2 | 17 | 37.3 | 3.10E-04 | 0.760 | 0.850 | associated with nucleic acid metabolism | | [illegible] | |
| [illegible] | 17 | 37.3 | [illegible] | [illegible] | [illegible] | ubiquitin proteasome system protein associated with protein metabolism | | 24.4/Lasp1 | |
| PIP5K2B | 17 | 37.3 | 5.07E-03 | 0.400 | 0.380 | lipid kinase associated with signal transduction | | 47.5/Lasp1 | |
| FLJ20291 | 17 | 37.3 | [illegible] | [illegible] | [illegible] | unknown | | [illegible]/Lasp1 | |
| PPARBP* | 17 | 37.9 | 2.13E-04 | 0.089 | 0.260 | transcription-regulatory protein associated with signal transduction | Zhu et al., 2000 | | |
| STARD3 | 17 | 38.2 | [illegible] | [illegible] | [illegible] | [illegible] protein associated with transport | | [illegible] | |
| TCAP | 17 | 38.2 | 1.26E-05 | 0.640 | 0.700 | structural protein associated with cell growth and/or maintenance | | 23.1/Znfn1a3 | |
| PNMT* | 17 | 38.2 | [illegible] | [illegible] | 0.010 | methyltransferase associated with metabolism and energy | | 21.1/Znfn1a3 | yes |
| PERLD1 | 17 | 38.2 | 3.41E-09 | 0.930 | 0.840 | membrane protein of unknown function | | 18.2/Znfn1a3 | |
| ERBB2 | 17 | 38.2 | [illegible] | 0.110 | [illegible] | receptor tyrosine kinase associated with signal transduction | Slamon et al., 1989 | | yes/trastuzumab, lapatinib |
| GRB7* | 17 | 38.3 | 7.28E-08 | 0.044 | 0.300 | adapter molecule associated with signal transduction | Tanaka et al., 2000 | 10.8/Znfn1a3 | |
| [illegible] | 17 | 38.4 | [illegible] | 0.710 | 0.690 | unknown | | [illegible]/Znfn1a3 | |
| PSMD3 | 17 | 38.5 | 4.25E-03 | 0.250 | 0.510 | ubiquitin proteasome system protein associated with protein metabolism | | [illegible]/Znfn1a3 | |
| NR1D1 | 17 | 38.6 | 1.28E-03 | 0.210 | 0.750 | nuclear receptor associated with signal transduction | | 73.4/Cdc6 | yes |
| ZNF217 | 20 | 52.9 | 5.02E-06 | 0.650 | 0.650 | transcription factor associated with signal transduction | Nonet et al., 2001 | 39.3/Zfp217 | |
| [illegible] | 20 | [illegible] | [illegible] | [illegible] | 0.140 | unknown | | [illegible]/Zfp217 | |
| CSTF1 | 20 | 55.7 | 7.15E-03 | 0.150 | 0.330 | pre-mRNA processing | | | |
| RAE1 | 20 | 56.6 | [illegible] | 0.360 | 0.420 | RNA-binding protein associated with [illegible] | Babu et al., 2003 | | |
| RNPC1 | 20 | 56.6 | 1.19E-03 | 0.750 | 0.830 | RNA-binding protein associated with nucleic acid metabolism | | | |
| PCK1 | 20 | 56.8 | 9.78E-03 | 0.250 | 0.330 | [illegible] associated with energy and metabolism | | | |
| TMEPAI* | 20 | 56.9 | 1.21E-04 | [illegible] | 0.077 | unknown | | | |
| RAB22A | 20 | 57.6 | 3.15E-05 | 0.990 | 0.340 | GTPase associated with signal transduction | | | |
| VAPB | 20 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] protein | | | |
| STX16 | 20 | 57.9 | 2.63E-05 | 0.220 | 0.790 | transport/cargo protein | | | |
| NPEPL1 | 20 | 57.9 | 3.35E-05 | 0.270 | 0.800 | aminopeptidase associated with protein metabolism | | | |
| GNAS** | 20 | 58.1 | 6.60E-03 | 0.052 | 0.058 | G protein associated with signal transduction | | | |
| TH1L | 20 | 58.2 | 1.14E-04 | 0.530 | 0.800 | transcription-regulatory protein associated with nucleic acid metabolism | | 36.7/Th1l | |
| C20orf45 | 20 | 58.3 | 6.29E-04 | 0.970 | 0.790 | unknown | | 88.7/Th1l | |

Functional annotation was based on the Human Protein Reference Database (http://hprd.org/). Genes marked with an asterisk are associated with reduced
survival duration or distant recurrence when overexpressed in nonamplifying tumors. Genes marked with two asterisks are significantly associated with reduced
survival duration or distant recurrence (p < 0.05) when downregulated in nonamplifying tumors. Distances to sites of recurrent viral integration were
determined from published information (Akagi et al., 2004). The last column identifies genes that have predicted protein folding characteristics that suggest
that they might be druggable (Russ and Lampel, 2005).

Figure 4. Results of unsupervised hierarchical clustering of 130 breast tumors
using intrinsically variable gene expression but excluding any transcripts
whose levels were significantly associated with genome copy number
Color key: expression subtype (basal, ERBB2, LumA, LumB, normal-like); expression level scale from -1.50 to 1.50.
Red indicates increased expression, and green indicates reduced expression.
An annotated version is provided as Figure S3.

metabolism in the region of amplification at 8p11-12 was especially
strong. Interestingly, the region of recurrent amplification
at 8p11-12 described above was reduced in copy number in
some tumors, and this event also was associated with poor outcome.
This raises the possibility that poor clinical outcome in
tumors with 8p11-12 abnormalities is due to increased genome
instability/mutagenesis resulting from either up- or downregulation
of genes encoded in this region. This concept is supported
by studies in yeast showing that up- or downregulation of genes
involved in chromosome integrity and segregation can produce
similar instability phenotypes (Ouspenski et al., 1999).

Therapeutic targets 

The 66 genes we found to be deregulated by the high-level
amplifications associated with poor outcome are particularly interesting
as therapeutic targets for treatment of patients who are
refractory to current therapies. Small-molecule or antibody-based
inhibitors have already been developed for FGFR1
(PD173074; Ray et al., 2004), IKBKB (PS-1145; Lam et al.,
2005), and ERBB2 (Trastuzumab; Vogel et al., 2002), and six
others (PROSC, ADAM9, FNTA, ACACA, PNMT, and NR1D1)
are considered to be druggable based on the presence of
predicted protein folds that favor interactions with drug-like
compounds (Russ and Lampel, 2005). Taking ERBB2 as the
paradigm (recurrently amplified, overexpressed, associated
with outcome and with demonstrated functional importance in
cancer) suggests FGFR1, TACC1, ADAM9, IKBKB, PNMT, and
GRB7 as high-priority therapeutic targets in these regions of
amplification.

Experimental procedures 
Tumor characteristics 

Frozen tissue from UC San Francisco and the California Pacific Medical Cen- 
ter collected between 1989 and 1997 was used for this study. Tissues were 
collected under IRB-approved protocols with patient consent. Tissues were 
collected, frozen over dry ice within 20 min of resection, and stored at -80°C.
An H&E section of each tumor sample was reviewed, and the frozen block
was manually trimmed to remove normal and necrotic tissue from the periphery.
Clinical follow-up was available with a median time of 6.6 years overall
and 8 years for censored patients. Tumors were predominantly early stage 
(83% stage I and II) with an average diameter of 2.6 cm. About half of the 
tumors were node positive, 67% were estrogen receptor positive, 60% re- 
ceived tamoxifen, and half received adjuvant chemotherapy (typically adria- 
mycin and Cytoxan). Clinical characteristics of the individual tumors are 
provided together with expression and array CGH profiles in the CaBIG re- 
pository and at http://cancer.lbl.gov/breastcancer/data.php. 

Array CGH 

Each sample was analyzed using Scanning and OncoBAC arrays. Scanning
arrays comprised 2464 BACs selected at approximately megabase
intervals along the genome as described previously (Hodgson et al., 2001;
Snijders et al., 2001). OncoBAC arrays comprised 960 P1, PAC, or
BAC clones. About three-quarters of the clones on the OncoBAC arrays
contained genes and STSs implicated in cancer development or progression.
All clones were printed in quadruplicate. DNA samples for array CGH
were labeled generally as described previously (Hackett et al., 2003; Hodgson
et al., 2001; Snijders et al., 2001). Briefly, 500 ng each of cancer and
normal female genomic DNA sample was labeled by random priming with
CY3- and CY5-dUTP, respectively; denatured; and hybridized with unlabeled
Cot-1 DNA to CGH arrays. After hybridization, the slides were washed and
imaged using a 16-bit CCD camera through CY3, CY5, and DAPI filters (Pinkel
et al., 1998).

Expression profiling 

Expression profiling was accomplished using the Affymetrix High Through- 
put Array (HTA) GeneChip system, in which target preparations, washing, 
and staining were carried out in a 96-well format. Detailed methods are de- 
scribed in the Supplemental Data. 

Statistical considerations 
Data processing 

Array CGH data image analyses were performed as described previously
(Jain et al., 2002). In this process, an array probe was assigned a missing
value for an array if there were fewer than two valid replicates or the standard
deviation of the replicates exceeded 0.2. Array probes missing in more than
50% of samples in OncoBAC or scanning array data sets were excluded in
the corresponding set. Array probes representing the same DNA sequence
were averaged within each data set and then between the two data sets.
Finally, the two data sets were combined, and the array probes missing in more
than 25% of the samples, unmapped array probes, and probes mapped to 
chromosome Y were eliminated. The final data set contained 2149 unique 
probes. For Affymetrix data, multichip robust normalization was performed 
using RMA software (Irizarry et al., 2003). Transcripts assessed on the arrays 
were classified into two groups using Gaussian model-based clustering by 
considering the joint distribution of the median and standard deviation of 
each probe set across samples. During this process, computational de- 
mands were reduced by randomly sampling and clustering 2000 probe inten- 
sities using mclust (Yeung et al., 2001, 2004) with two clusters and unequal 
variance. Next, the remaining probe intensities were classified into the newly 
created clusters using linear discriminant analysis. The cluster containing 
probe intensities with smaller mean and variance was defined as "not ex- 
pressed," and the second cluster was defined as "expressed." 
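
The replicate and missing-value rules for the array CGH probes described above can be illustrated with a minimal sketch. It assumes that replicate values are summarized by their mean and that the sample standard deviation is used; neither detail is stated here, and the function names are hypothetical.

```python
import statistics

MIN_VALID_REPLICATES = 2      # fewer valid replicates than this -> missing
MAX_REPLICATE_SD = 0.2        # larger replicate spread than this -> missing
MAX_MISSING_FRACTION = 0.25   # final filter applied to the combined data set


def summarize_probe(replicates):
    """Collapse the replicate values of one probe on one array.

    Returns the mean of the valid (non-None) replicates, or None when there
    are fewer than two valid replicates or their standard deviation exceeds
    0.2. Using the mean and the sample standard deviation is an assumption.
    """
    valid = [r for r in replicates if r is not None]
    if len(valid) < MIN_VALID_REPLICATES:
        return None
    if statistics.stdev(valid) > MAX_REPLICATE_SD:
        return None
    return statistics.fmean(valid)


def keep_probe(values_across_samples, max_missing=MAX_MISSING_FRACTION):
    """Keep a probe unless it is missing in more than max_missing of samples."""
    n_missing = sum(v is None for v in values_across_samples)
    return n_missing / len(values_across_samples) <= max_missing


if __name__ == "__main__":
    # Four printed replicates, one of which failed quality control.
    print(summarize_probe([0.10, 0.12, None, 0.11]))
    print(keep_probe([0.1, None, 0.2, 0.3]))
```

The same `keep_probe` helper can be called with `max_missing=0.5` for the per-platform filter that precedes the merge of the two array sets.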
Characterizing copy number changes 

The sample profiles were segmented into the levels of equal copy number
common to the whole genome, and the copy number transitions,
amplifications, and frequency of alterations were determined using previously
described methodologies (Snijders et al., 2003; Fridlyand et al.,
2006). The detailed approaches are described in the Supplemental Data.
Clustering of genome copy number profiles 

Genome copy number profiles were clustered using smoothed imputed data 
with outliers present. Agglomerative hierarchical clustering with Pearson's 
correlation as a similarity measure and the Ward method to minimize sum 
of variances were used to produce compact spherical clusters (Hartigan, 
1975). The number of groups was assessed qualitatively by considering 
the shape of the clustering dendrogram.
Expression subtype assignment 

Tumors were classified according to expression phenotype (basal, ERBB2, 
luminal A, luminal B, and normal-like) by assigning each tumor to the subtype 
of the cluster defined by hierarchical clustering of expression profiles for 122 
samples published by Sorlie et al. (2003) to which it had the highest Pearson's 
correlation. The correlation was computed using the subset of Stanford 
intrinsically variable genes common to both data sets. For details, refer to 
the Supplemental Data. 
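
As an illustration of the subtype assignment rule, the sketch below assigns a tumor to the reference subtype with which its expression profile has the highest Pearson's correlation over the shared genes. Representing each reference cluster by a single average profile (a centroid) is an assumption made for this example, and the gene names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson's correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def assign_subtype(tumor_profile, reference_profiles):
    """Return (subtype, correlation) for the best-matching reference profile.

    tumor_profile: dict mapping gene -> expression value.
    reference_profiles: dict mapping subtype -> dict of gene -> value
    (hypothetical per-subtype average profiles).
    Only genes present in both profiles are used.
    """
    best_subtype, best_r = None, -2.0
    for subtype, reference in reference_profiles.items():
        shared = sorted(set(tumor_profile) & set(reference))
        r = pearson([tumor_profile[g] for g in shared],
                    [reference[g] for g in shared])
        if r > best_r:
            best_subtype, best_r = subtype, r
    return best_subtype, best_r


if __name__ == "__main__":
    references = {
        "basal": {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5},
        "luminal A": {"GENE_A": -1.5, "GENE_B": 1.0, "GENE_C": 0.2},
    }
    tumor = {"GENE_A": 1.8, "GENE_B": -0.7, "GENE_C": 0.1, "GENE_D": 3.0}
    print(assign_subtype(tumor, references))
```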
Association of copy number with survival 

Stage 4 samples were excluded from all the outcome-related analyses, and
disease-specific survival and time to distant recurrence were used as the two
endpoints. The significance of the standardized regression coefficient in a Cox
proportional hazards model was used to determine the clinical (univariate and
multivariate analyses) and genomic variables (individual clones, instability
summary measures, and recurrent amplicon status) associated with outcome. The
p values for individual clones were adjusted using FDR. Significance was declared at
p < 0.05. For details, see the Supplemental Data.
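
The FDR adjustment of the per-clone p values can be sketched as follows. The text does not name the specific FDR procedure, so the Benjamini-Hochberg step-up adjustment used here is an assumption, and the input p values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Each p value is scaled by m / rank (rank 1 = smallest p value), and a
    running minimum taken from the largest p value downward keeps the
    adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    clone_p_values = [0.01, 0.04, 0.03, 0.20]  # invented per-clone p values
    adjusted = benjamini_hochberg(clone_p_values)
    significant = [p < 0.05 for p in adjusted]
    print(adjusted, significant)
```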
Association of copy number with expression 

The presence of an overall dosage effect was assessed by subdividing each 
chromosomal arm into nonoverlapping 20 Mb bins and computing the aver- 
age of cross-Pearson's-correlations for all gene transcript-BAC probe pairs 
that mapped to that bin. We also calculated Pearson's correlations and cor- 
responding p values between expression level and copy number for each 
gene transcript. Each transcript was assigned an observed copy number 
of the nearest mapped BAC array probe. Eighty percent of gene transcripts 
had a nearest clone within 1 Mbp, and 50% had a clone within 400 Kbp. Cor- 
relation between expression and copy number was only computed for the 
gene transcripts whose absolute assigned copy number exceeded 0.2 in 
at least five samples. This was done to avoid spurious correlations in the 
absence of real copy number changes. We used conservative Holm p value 
adjustment to correct for multiple testing. Gene transcripts with an adjusted 
p value <0.05 were considered to have expression levels that were highly 
significantly affected by gene dosage. This corresponded to a minimum 
Pearson's correlation of 0.44. 
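
The main steps of this dosage analysis (nearest-probe assignment, the filter on copy number change, the correlation, and the Holm adjustment) can be sketched in a few functions. This is a minimal illustration under stated assumptions: positions are in Mbp on a single chromosome, the function names are hypothetical, and the computation of the correlation p values themselves is left out.

```python
import math


def nearest_probe(gene_position, probe_positions):
    """Index of the mapped array probe closest to a gene (same chromosome)."""
    return min(range(len(probe_positions)),
               key=lambda i: abs(probe_positions[i] - gene_position))


def has_copy_number_change(copy_numbers, threshold=0.2, min_samples=5):
    """True if the absolute assigned copy number exceeds the threshold in at
    least min_samples samples (0.2 and five samples, as in the text)."""
    return sum(abs(c) > threshold for c in copy_numbers) >= min_samples


def pearson(x, y):
    """Pearson's correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def holm_adjust(p_values):
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[index]))
        adjusted[index] = running_max
    return adjusted


if __name__ == "__main__":
    probes_mbp = [37.0, 38.0, 39.5]            # invented probe positions
    print(nearest_probe(38.3, probes_mbp))
    copy_number = [0.3, -0.25, 0.5, 0.21, 0.4, 0.0]
    expression = [1.1, -0.6, 1.9, 0.4, 1.2, 0.1]
    if has_copy_number_change(copy_number):
        print(pearson(copy_number, expression))
    print(holm_adjust([0.01, 0.04, 0.03]))
```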

Associations of transcription and CNA in regions of amplification 
with outcome in tumors without particular amplicons 
We assessed the associations of levels of transcripts in regions of amplifica- 
tions with survival or distant recurrence in tumors without amplifications in or- 
der to find genes that might contribute to progression when deregulated by 
mechanisms other than amplification (e.g., we assessed associations be- 
tween expression levels of the genes mapping to the 8p11-12 amplicon 
and survival in samples without 8p11-12 amplification). We performed separate
Cox-proportional regressions for disease-specific survival and distant
recurrence. Stage 4 samples were excluded from all analyses. 
Testing for functional enrichment 

We used the gene ontology statistics tool GoStat (Beissbarth and Speed, 
2004) to test whether the gene transcripts with the strongest dosage effects 
were enriched for particular functional groups. The p values were adjusted 
using false discovery rate. The categories were considered significantly over- 
represented if the FDR-adjusted p value was less than 0.001. Since ex- 
pressed genes were significantly more likely to show dosage effects than 
nonexpressed genes (p value < 2.2E-16, Wilcoxon rank-sum test), GoStat 
comparisons were performed only for expressed genes. Specifically, GO 
categories for 1734 expressed probes with significant dosage effect (Holm 
p value < 0.05) were compared with those for 3026 expressed probes with 
no dosage effect (Pearson's correlation < 0.1). 
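
To make the comparison concrete, the sketch below computes a one-sided hypergeometric (Fisher-type) p value for overrepresentation of a single GO category among the probes with a dosage effect relative to those without. Whether this matches GoStat's internal computation exactly is an assumption, and the category counts in the usage example are invented; only the group sizes (1734 and 3026) come from the text.

```python
from math import comb


def enrichment_p_value(hits_in_category, n_hits,
                       background_in_category, n_background):
    """One-sided hypergeometric p value for overrepresentation.

    hits_in_category: probes with a dosage effect annotated to the category.
    n_hits: all probes with a dosage effect.
    background_in_category: probes without a dosage effect in the category.
    n_background: all probes without a dosage effect.
    Returns P(X >= hits_in_category) when n_hits probes are drawn at random
    from the pooled set of n_hits + n_background probes.
    """
    total = n_hits + n_background
    annotated = hits_in_category + background_in_category
    upper = min(n_hits, annotated)
    tail = sum(comb(annotated, k) * comb(total - annotated, n_hits - k)
               for k in range(hits_in_category, upper + 1))
    return tail / comb(total, n_hits)


if __name__ == "__main__":
    # Group sizes from the text; the category counts (120 and 100) are invented.
    p = enrichment_p_value(120, 1734, 100, 3026)
    print(p)
```

In practice, one such p value would be computed per GO category and the resulting set adjusted for multiple testing before applying the 0.001 threshold.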

Microarray data 

The raw data for expression profiling are available at ArrayExpress
(http://www.ebi.ac.uk/arrayexpress/) with accession number E-TABM-158.



Clinical characteristics of the individual tumors as well as array CGH and expression
profiles are available in the CaBIG repository
(http://caarraydb.nci.nih.gov/caarray/publicExperimentDetailAction.do?expId=1015897589973255), at
http://cancer.lbl.gov/breastcancer/data.php, and in the Supplemental Data.

Supplemental data 

The Supplemental Data include Supplemental Experimental Procedures,
three supplemental figures, and three supplemental tables and can be found
with this article online at http://www.cancercell.org/cgi/content/full/10/6/529/DC1/.